On the Cache Access Behavior of OpenMP Applications

Authors

  • Jie Tao
  • Wolfgang Karl
Abstract

The widening gap between memory and processor speed makes it increasingly important to improve cache utilization. This issue is especially critical for OpenMP execution, which usually exploits fine-grained parallelism. The work presented in this paper studies the cache behavior of OpenMP applications in order to detect potential optimizations with respect to cache locality. The study is based on a simulation environment that models the parallel execution of OpenMP programs and provides comprehensive information about runtime data accesses. This performance data enables a detailed analysis and an easy understanding of the cache operations performed on-line during execution.

Related resources

Tracing the Cache Behaviour of Data Structures in Fortran Applications

In an application, data access can become a major performance bottleneck if the memory hierarchy of the underlying hardware architecture is not taken into account. The only way to gain deeper insight into an application's memory usage is to measure its data access behavior with hardware counters. From the programmer's point of view such performance data (like cache misses or hits) have to be linke...

Characterization of Multithreaded Scientific Workloads on Simultaneous Multithreading Intel Processors

Simultaneous Multithreading (SMT) is a technique that allows multiple independent threads to execute different instructions each cycle. Hyper-Threading (HT) is an implementation of SMT available on recent processors from Intel. Naturally, multithreaded applications are well suited to SMT systems. However, due to extensive resource sharing, HT may not suitably benefit OpenMP high performance ...

Performance Analysis of PC-CLUMP based on SMP-Bus Utilization

PC-CLUMP (Cluster of Multiprocessors) is one of the most cost-effective commodity-based platforms for HPC applications. Increasing the number of CPUs per SMP node yields a very compact system and a very low network interface cost per processor while keeping the total number of CPUs in the system unchanged. However, the performance of the SMP bus on such an SMP-PC node is relatively poor compared with that of SMP...

A Performance Model for OpenMP Memory Bound Applications in Multisocket Systems

The performance of OpenMP applications executed on multisocket multicore processors can be limited by the memory interface. In a multisocket environment, each multicore processor can suffer performance degradation in memory-bound parallel regions when its cores share the same Last Level Cache (LLC). We propose a characterization of the performance of parallel regions to estimate cache misses and exe...

Scheduling Dynamic OpenMP Applications over Multicore Architectures

Approaching the theoretical performance of hierarchical multicore machines requires a very careful distribution of threads and data across the underlying non-uniform architecture in order to minimize cache misses and NUMA penalties. While it is acknowledged that OpenMP can enhance the quality of thread scheduling on such architectures in a portable way, by transmitting precious information about...

Publication date: 2004